Document page classification algorithms in low-end copy pipeline
Authors
Abstract
We develop real-time, low-complexity image classification algorithms suitable for a copy mode selector embedded in a low-end copier. The algorithms classify scanned images represented in RGB or in an opponent color space. Classes are the eight combinations of mono/color and text/mix/picture/photo. Classification is 30–98% accurate with misclassifications tending to be benign. The algorithms provide for improved copy quality, a simplified user interface, and increased copy rate. © 2008 SPIE and IS&T. DOI: 10.1117/1.3010879
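To make the class structure concrete, the sketch below enumerates the eight classes named in the abstract and shows one generic way to move an RGB pixel into an opponent color space. The transform coefficients, the `is_color_pixel` helper, and its threshold are illustrative assumptions; the abstract does not specify the opponent space or the decision rules the authors use.

```python
from itertools import product

COLOR_MODES = ("mono", "color")
CONTENT_TYPES = ("text", "mix", "picture", "photo")

# The eight classes from the abstract: every (color mode, content type) pair.
CLASSES = [f"{mode}/{content}" for mode, content in product(COLOR_MODES, CONTENT_TYPES)]


def rgb_to_opponent(r, g, b):
    """Generic opponent transform (assumed here, not taken from the paper).

    Returns (luminance, red-green, blue-yellow).
    """
    luminance = (r + g + b) / 3.0
    red_green = r - g
    blue_yellow = b - (r + g) / 2.0
    return luminance, red_green, blue_yellow


def is_color_pixel(r, g, b, threshold=16.0):
    """Hypothetical chroma test: a pixel counts as colored when either
    opponent chroma channel exceeds the (arbitrary) threshold."""
    _, red_green, blue_yellow = rgb_to_opponent(r, g, b)
    return max(abs(red_green), abs(blue_yellow)) > threshold
```

Under these assumptions a neutral gray pixel has zero chroma in both opponent channels, so a mono/color decision could be made by counting the pixels that pass `is_color_pixel`; the published algorithms may work quite differently.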
Similar resources
DOCUMENT PAGE CLASSIFICATION AND NONLINEAR DIFFUSION FILTERING FOR IMAGE SEGMENTATION AND NOISE REMOVAL A Dissertation
Dong, Xiaogang Ph.D., Purdue University, May, 2007. Document Page Classification and Nonlinear Diffusion Filtering for Image Segmentation and Noise Removal. Major Professor: Ilya Pollak. We develop a real-time, strip-based, low-complexity document page classification algorithm, which can be used as a copy mode selector in the copy pipeline. It analyzes the scan images and classifies them into o...
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, low-resolution and high-resolution, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments document image to a set of regions. By high-resolution page segmentation, by connected components analysis, each region is segmented to homogeneous regions and identifyi...
Automatic Web Page Classification
Aim of this paper is to describe a method of automatic web page classification to semantic domains and its evaluation. The classification method exploits machine learning algorithms and several morphological as well as semantical text processing tools. In contrast to general text document classification, in the web document classification there are often problems with short web pages. In this p...
A Patient-Centric SNV-CNV Pipeline
This application note outlines the Illumina methodology for estimating DNA copy number for data produced on Affymetrix Genome Wide Human single nucleotide polymorphism (SNP) 5.0 and 6.0 arrays on the BaseSpace Correlation Engine. Within a patient-centric context, data are obtained for an individual patient rather than a batch. Also, patient data are often supplied without a matching reference. ...
Noise reduction through summarization for Web-page classification
Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the perfor...
Journal:
- J. Electronic Imaging
Volume 17, Issue
Pages -
Publication date: 2008